Ambiguity of Human Gene Symbols in LocusLink and MEDLINE: Creating an Inventory and a Disambiguation Test Collection

Authors

  • Marc Weeber
  • Bob J. A. Schijvenaars
  • Erik M. van Mulligen
  • Barend Mons
  • Rob Jelier
  • C. Christiaan van der Eijk
  • Jan A. Kors
Abstract

Genes are discovered almost daily, and new names have to be found for them. Although there are guidelines for gene nomenclature, the naming process is highly creative. Human genes are often named with a gene symbol and a longer, more descriptive term; the short form is very often an abbreviation of the long form. Abbreviations in biomedical language are highly ambiguous, i.e., one gene symbol often refers to more than one gene. Using an existing abbreviation expansion algorithm, we explore MEDLINE for the use of human gene symbols derived from LocusLink. It turns out that just over 40% of these symbols occur in MEDLINE; however, many of these occurrences are not related to genes. In the process of making the inventory, a disambiguation test collection is constructed automatically.
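The idea of a symbol inventory and of expansion-based disambiguation can be illustrated with a minimal Python sketch. This is not the abbreviation expansion algorithm used in the paper, which the abstract does not specify. It assumes a simple "long form (SYMBOL)" pattern in the text, and the records in `TOY_GENES`, the gene identifiers, and the function names are all hypothetical placeholders rather than LocusLink data.

```python
import re
from collections import defaultdict

# Hypothetical toy records: (symbol, gene_id, long_form). Not real LocusLink entries.
TOY_GENES = [
    ("ABC1", "gene-001", "alpha binding component 1"),
    ("ABC1", "gene-002", "acid base carrier 1"),
    ("XYZ2", "gene-003", "xenobiotic yield zone 2"),
]

def build_inventory(records):
    """Map each symbol to the set of gene ids that share it."""
    inventory = defaultdict(set)
    for symbol, gene_id, _long_form in records:
        inventory[symbol].add(gene_id)
    return dict(inventory)

def ambiguous_symbols(inventory):
    """Return the symbols that refer to more than one gene, sorted."""
    return sorted(s for s, ids in inventory.items() if len(ids) > 1)

def resolve(text, records):
    """For each known symbol written as '... long form (SYMBOL)', return
    (symbol, gene_id), or (symbol, None) when no known long form precedes it."""
    by_symbol = defaultdict(list)
    for symbol, gene_id, long_form in records:
        by_symbol[symbol].append((gene_id, long_form.lower()))
    results = []
    for match in re.finditer(r"\(([A-Za-z0-9-]+)\)", text):
        symbol = match.group(1)
        if symbol not in by_symbol:
            continue  # parenthesised token is not a symbol in the inventory
        preceding = text[:match.start()].lower().rstrip()
        gene = None
        for gene_id, long_form in by_symbol[symbol]:
            if preceding.endswith(long_form):
                gene = gene_id
                break
        results.append((symbol, gene))
    return results
```

Under these assumptions, a sentence whose expansion matches a known long form is labelled with the corresponding gene, which is how a test collection could in principle be assembled automatically. An occurrence whose expansion matches no long form comes back as `None`, corresponding to the abstract's observation that many occurrences are not related to genes.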

Similar resources

Gene symbol disambiguation using knowledge-based profiles

MOTIVATION The ambiguity of biomedical entities, particularly of gene symbols, is a big challenge for text-mining systems in the biomedical domain. Existing knowledge sources, such as Entrez Gene and the MEDLINE database, contain information concerning the characteristics of a particular gene that could be used to disambiguate gene symbols. RESULTS For each gene, we create a profile with diff...

Combining multiple evidence for gene symbol disambiguation

Gene names and symbols are important biomedical entities, but are highly ambiguous. This ambiguity affects the performance of both information extraction and information retrieval systems in the biomedical domain. Existing knowledge sources contain different types of information about genes and could be used to disambiguate gene symbols. In this paper, we applied an information retrieval (IR) b...

An Alkaline Phosphatase Reporter Gene Assay for Induction of CYP3A4 In Vitro

CYP3A4 probably has the broadest catalytic activity of any cytochrome P450. It is a crucial task to test new drug candidates in a reliable system for their ability to induce expression of this enzyme. Firstly, a total of 300 bp core distal enhancer of CYP3A4 XREM region (-7972/-7673) were amplified from human genomic DNA. The PCR product was then ligated into a human secretory alkaline phosphat...

Creating Algorithmic Symbols to Enhance Learning English Grammar

This paper introduces a set of English grammar symbols that the author has developed to enhance students’ understanding and consequently, application of the English grammar rules. A pretest-posttest control-group design was carried out in which the samples were students in two girls’ senior high schools (N=135, P ≤ 0.05) divided into two groups: the Treatment which received gramm...

Journal:
  • AMIA ... Annual Symposium proceedings. AMIA Symposium

Publication date: 2003